ABSTRACT: The analysis of visual motion divides naturally into two stages: the
first is the measurement of motion, for example, the assignment of direction and 
magnitude of velocity to elements in the image, on the basis of the changing intensity 
pattern; the second is the use of motion measurements, for example, to separate the 
scene into distinct objects, and infer their three-dimensional structure. In this paper, 
we present a computational study of the measurement of motion. Similar to other 
visual processes, the motion of elements is not determined uniquely by information 
in the changing image; additional constraint is required to compute a unique velocity 
field. Given this global ambiguity of motion, local measurements from the changing 
image, such as those provided by directionally-selective simple cells in primate 
visual cortex, cannot possibly specify a unique local velocity vector, and in fact, 
specify only one component of velocity. Computation of the full two-dimensional 
velocity field requires the integration of local motion measurements, either over 
an area, or along contours in the image. We will examine possible algorithms for 
computing motion, based on a range of additional constraints. Finally, we will 
present implications for the biological computation of motion. 

1. Introduction


The organization of movement in a changing two-dimensional image provides a 
valuable source of information for analyzing the environment in terms of objects,
their motion in space, and their three-dimensional structure. It is not surprising,
therefore, that the analysis of visual motion plays a central role in biological vision
systems. Sophisticated mechanisms for extracting and utilizing motion exist even in
simple animals. For example, the frog has efficient "bug detection" mechanisms that
respond selectively to small, dark objects moving in its visual field [1]. The ordinary
housefly can track moving objects and discover the relative motion between a target 
and its background, even when the two are identical in texture, and therefore 
indistinguishable in the absence of relative motion [2]. 

In higher animals, including primates, the analysis of motion is "wired into"
the visual system from the earliest processing stages. Some species, such as the
pigeon [3] and rabbit [4] (see [5] for other examples), perform rudimentary motion
analysis at the retinal level. In other animals, including cats and primates, the first
neurons in visual cortex to receive input from the eyes are already involved in the
analysis of motion: they respond well to stimuli moving in one direction, but little,
or not at all, to motion in the opposite direction [6,7].

In some animals, visual motion is used in the guidance of locomotion and 
the control of body motion. The plummeting gannet [8], for example, uses visual
flow information to stretch back its wings a fraction of a second before it hits
the water. Perhaps the most remarkable use of visual motion is the recovery
of three-dimensional shape using motion information alone. This capacity of the
human visual system has been demonstrated in the studies of Wallach and O’Connell
[9] and Johansson [10,11].

The extensive use of motion by biological systems, and in particular the human 
visual system, demonstrates the feasibility of carrying out certain information 
processing tasks and helps to establish specific goals for the analysis of time-varying 
imagery. This analysis divides naturally into two parts. The first stage is the 
measurement of motion; for example, the assignment of direction and magnitude 
of velocity to elements in the image, on the basis of the changing intensity pattern.
The second is the use of motion measurements; for example, to separate the scene
into distinct objects, and infer their three-dimensional structure. 

In this paper, we present a computational study of the measurement of visual 
motion. This problem has proved surprisingly difficult, both in
computer vision, and in modelling biological vision systems. We will present the 
general problem of motion measurement in Section 2, and discuss methods that 
have been proposed for its solution. Section 3 presents a specific scheme, proposed 
by Marr and Ullman [12], for extracting the first motion measurements from the 
changing image. The initial measurements do not yet specify the true motion of 
objects in the changing image, and must be combined in some way. This raises the 
motion integration problem, which will be discussed in Sections 3 and 4. Section 
5 presents some implications for the analysis of motion in biological vision systems. 


2. Motion Detection and Measurement 


The motion of elements and regions in an image is not given directly, but must 
be computed from more elementary measurements. The initial registration of light 
by the eye or by electronic imaging devices can be described as producing a 
two-dimensional array of time-dependent light intensity values, I(x,y,t). Motion in
the image can be described in terms of a vector field V(x,y,t) giving the velocity
of a point with image coordinates (x,y) at time t. The first problem in analyzing
visual motion is the computation of V(x,y,t) from I(x,y,t). This computation is 
the measurement of visual motion. 


In some cases, it may be sufficient to detect only certain properties of the
velocity field, rather than measure it completely and precisely. For example, in order
to respond quickly to a moving object, motion must be detected, but not necessarily
measured. Other tasks, such as the recovery of three-dimensional structure from
motion, require a more complete and accurate measurement of the velocity field
[13-17].


The measurement of motion may be performed at different stages in the 
processing of an image, utilizing different motion primitives. It is useful to draw a 
distinction between two main schemes. At the lowest level, motion measurements 
may be based directly on the local changes in light intensity values; these are
called intensity-based schemes. Alternatively, it is possible to first identify features 
such as edges and their termination points, corners, blobs, or regions, and then 
measure motion by matching these features over time, and detecting their changing 
positions. Schemes of this type are called token-matching schemes. These two 
modes of motion detection and measurement give rise to different computational 
problems, and consequently to different kinds of processes in biological as well as 
computer vision systems. 


2.1. Intensity-based Schemes for Motion Measurement 

Two main types of intensity-based schemes have been advanced for biological 
and computer vision systems: correlation techniques and gradient methods.
Cross-correlation of raw intensity values has been used in computer vision applications
[18-21], and has been proposed as a model for motion measurement in the human
visual system [22-24]. Related to cross-correlation schemes are subtraction schemes
involving simple differencing operations between successive frames. In computer
vision, such schemes are primarily used for the detection of motion, and object
segmentation [25-28]; together with cross-correlation, they have been utilized for
the measurement of motion [26,28]. A fundamental problem of most correlation
and subtraction schemes is that they assume the image (or a large portion of it) 
moves as a whole between the two frames. Images containing independently moving 
objects and image distortions induced by the unrestricted motion of objects in 
space pose difficult, perhaps insurmountable, problems for these techniques. 

Other intensity-based schemes have been proposed for biological systems. A 
simple motion detector can be constructed by comparing the outputs of two 
detectors to light increments at two adjacent positions. The output at position $p_1$
and time $t$ is compared with that at $p_2$ at time $t - \delta t$ (a low-pass temporal filter
may be used instead of the delay [29]). Two variations of this approach, called the
delayed comparison scheme, have been proposed as models for biological systems.
The first is obtained by multiplying the two values, i.e. $D(p_1, t) \cdot D(p_2, t - \delta t)$, where
$D$ denotes the output of the subunits, shown in Figure 1a. If a point of light moves
from $p_2$ to $p_1$ in time equal to $\delta t$, this product will be positive. In an array of such
detectors, the average output is essentially equivalent to a cross-correlation of the
inputs [29]. An alternative method along the same general line is the "And-Not"
scheme proposed by Barlow and Levick [30] for the directionally selective units in
the rabbit's retina (a similar scheme was suggested for the cat's visual cortex [31]).

Figure 1. The delayed comparison schemes. (a) The two inputs are multiplied.
(b) The "And-Not" scheme.


Evidence for inhibitory interactions within the directionally selective mechanism
led to a model in which the motion detector computes the logical "And" of $D(p_1, t)$
and "Not" of $D(p_2, t - \delta t)$ (see Figure 1b). In this scheme, a motion from $p_2$ to $p_1$
is vetoed by a delayed response from $p_2$, whereas motion from $p_1$ to $p_2$ produces
a positive response. Poggio and Reichardt [32] have proposed a similar scheme for
the visual system of the fly, and an elegant synaptic mechanism that implements
these computations was described by Torre and Poggio [33].


Some general properties of the delayed comparison schemes are worth noting.
First, these detectors respond selectively not only to continuous motion, but also
to discrete jumps of the stimulus between positions $p_1$ and $p_2$. Second, the speed of
motion must lie within a certain range, determined by the delay (or the low-pass
filtering) and the separation between the receptors. A range of velocities can be
covered by a family of detectors with different internal delays and interreceptor
spacing. Finally, motion measurements cannot be determined reliably from the
output of a single detector of this type. The accurate and reliable measurement of
motion will require the combination of the outputs from an array of such elementary
detectors.
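
The two delayed comparison variants can be illustrated with a small sketch. It assumes discretely sampled, non-negative detector outputs and an integer delay; the function names `delayed_product` and `and_not` and the toy signals are hypothetical, and are not taken from the models cited above.

```python
def delayed_product(d_p1, d_p2, delay):
    """Sum over t of D(p1, t) * D(p2, t - delay)."""
    return sum(d_p1[t] * d_p2[t - delay] for t in range(delay, len(d_p1)))


def and_not(d_p1, d_p2, delay):
    """Count samples where D(p1, t) is active and D(p2, t - delay) is not."""
    return sum(
        1
        for t in range(delay, len(d_p1))
        if d_p1[t] > 0 and not d_p2[t - delay] > 0
    )


# Hypothetical spot of light crossing two receptors, two samples apart.
early = [0, 0, 1, 0, 0, 0, 0, 0]  # receptor reached first (t = 2)
late = [0, 0, 0, 0, 1, 0, 0, 0]   # receptor reached second (t = 4)

# Motion from p2 to p1: p2 responds early, p1 responds late.
product_p2_to_p1 = delayed_product(late, early, delay=2)
veto_p2_to_p1 = and_not(late, early, delay=2)

# Motion from p1 to p2: p1 responds early, p2 responds late.
product_p1_to_p2 = delayed_product(early, late, delay=2)
veto_p1_to_p2 = and_not(early, late, delay=2)
```

In this toy case the multiplicative detector responds only to motion from $p_2$ to $p_1$, while the And-Not detector responds only to motion from $p_1$ to $p_2$, because the delayed response from $p_2$ vetoes the opposite direction.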


In gradient schemes, the local motion measurements are derived via a comparison 
between intensity gradients, and temporal intensity changes. A one-dimensional 
example, illustrating the basic principle, is shown in Figure 2. Consider the intensity 
profile (intensity I as a function of position x), indicated by the solid curve in Figure 
2. At the point p, the profile has a positive slope. If the profile moves to the left, 
indicated by the dashed curve, the intensity value I at p will be increasing; for a 
rightward motion, indicated by the dotted and dashed curve, I(p) will be decreasing.
The sign of the temporal change in I(p) thus signals the direction of motion, and
from the magnitude of the spatial and temporal intensity changes, the speed of
motion can be determined. In principle, measurements of motion may be obtained 
wherever the image intensity gradient is non-zero; however, the measurements are 
more reliable at the location of edges, where steep intensity gradients are induced. 
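
The one-dimensional gradient principle can be sketched numerically. For a profile translating rigidly, $I(x,t) = f(x - vt)$, the temporal and spatial derivatives satisfy $I_t = -v I_x$, so $v = -I_t / I_x$ wherever $I_x \neq 0$. The logistic edge profile, the function names, and the finite-difference step below are assumptions made for illustration only.

```python
import math


def make_moving_profile(velocity):
    """Return I(x, t) for a smooth edge (a logistic ramp) moving at `velocity`."""
    def intensity(x, t):
        return 1.0 / (1.0 + math.exp(-(x - velocity * t)))
    return intensity


def gradient_velocity(intensity, x, t, h=1e-4):
    """Estimate velocity at (x, t) as -I_t / I_x using central differences."""
    i_x = (intensity(x + h, t) - intensity(x - h, t)) / (2 * h)
    i_t = (intensity(x, t + h) - intensity(x, t - h)) / (2 * h)
    return -i_t / i_x


# Positive x is taken to point rightward, so a negative velocity is leftward.
estimate_left = gradient_velocity(make_moving_profile(-2.0), 0.3, 0.0)
estimate_right = gradient_velocity(make_moving_profile(1.5), 0.3, 0.0)
```

The sign of the estimate gives the direction of motion and its magnitude gives the speed; the estimate becomes ill-conditioned as the spatial gradient approaches zero.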

Figure 2. Comparison of the sign of the spatial and temporal derivatives of intensity
at the point p yields the sign of direction of motion.

In two dimensions, the spatial and temporal intensity changes alone are not 
sufficient to determine the local direction and magnitude of velocity [12, 34-37],
because of the aperture problem, illustrated in Figure 3. If the motion of the
edge E is to be detected by operations which examine an area A that is small
compared to the overall extent of the edge, the only motion that can be extracted
is the component c perpendicular to the local orientation of the edge. For example,
such operations cannot distinguish between motion in the directions b, c, and d.
To determine the motion completely, a second stage of analysis is required, which
integrates the local motion measurements, either over an area of the image or
along contours.



Figure 3. The aperture problem. Motion in the directions b, c and d cannot be
distinguished when viewed through the local aperture A.
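
A short numerical sketch of the aperture problem follows. It assumes a straight edge at a hypothetical orientation of 45 degrees and three made-up velocities that differ only in how far they slide along the edge; the local measurement is modelled as the projection of the velocity onto the edge normal.

```python
import math


def perpendicular_component(velocity, edge_angle):
    """Project `velocity` onto the unit normal of an edge at `edge_angle` (radians)."""
    normal = (-math.sin(edge_angle), math.cos(edge_angle))
    return velocity[0] * normal[0] + velocity[1] * normal[1]


edge_angle = math.radians(45.0)
tangent = (math.cos(edge_angle), math.sin(edge_angle))
normal = (-math.sin(edge_angle), math.cos(edge_angle))

# Three hypothetical velocities: one unit along the normal, plus a different
# amount of sliding along the edge in each case.
candidates = [
    (normal[0] + slide * tangent[0], normal[1] + slide * tangent[1])
    for slide in (-1.0, 0.0, 1.0)
]
measured = [perpendicular_component(v, edge_angle) for v in candidates]
```

All three entries of `measured` agree, although the candidate velocities point in different directions, which is why a purely local measurement cannot separate them.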

2.2. Token-matching Schemes for Motion Measurement 

In token-matching schemes, identifiable elements - tokens - are located and then 
matched over time. Assuming that the visual input is given as a sequence of 
discrete frames, a counterpart for each element in one frame must be located in 
the next. This raises the correspondence problem, illustrated in Figure 4. The 
filled circles in the figure represent the first frame, and the open circles the second.
There are two possible one-to-one pairings between the elements of the two frames,
leading to two patterns of perceived motion: diagonal (a) or horizontal (b). In
this example, the match is only two-way ambiguous. In general, each frame could
contain many elements arranged in complex figures; a correspondence must then
be established among them. The rules governing the correspondence process in
human vision have been investigated [38-44], but are still far from being completely
understood. Token-matching schemes for motion measurement have also been
studied for computer vision [45-50]. 
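
The combinatorial nature of the correspondence problem can be sketched as follows. The code enumerates every one-to-one pairing between two frames and ranks the pairings by total displacement. Both the point coordinates and the minimum-total-distance ranking are assumptions chosen for illustration; they are not the rule used by the human visual system, nor a method from the cited studies.

```python
import itertools
import math


def enumerate_pairings(frame_one, frame_two):
    """Return (total_distance, pairs) for every one-to-one pairing, shortest first."""
    results = []
    for order in itertools.permutations(range(len(frame_two))):
        pairs = [(frame_one[i], frame_two[j]) for i, j in enumerate(order)]
        cost = sum(math.dist(a, b) for a, b in pairs)
        results.append((cost, pairs))
    return sorted(results, key=lambda item: item[0])


# Hypothetical two-element frames.
frame_one = [(0.0, 1.0), (0.0, 0.0)]
frame_two = [(2.0, 1.0), (2.0, 0.0)]
pairings = enumerate_pairings(frame_one, frame_two)
best_cost, best_pairs = pairings[0]
```

With two elements per frame there are two pairings; with n elements there are n! of them, which is why additional matching rules are needed in practice.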


Figure 4. A simple correspondence problem. (a) Diagonal match. (b) Horizontal match.


Two general problems of token-matching schemes are relevant to both 
biological and machine motion analysis. The first concerns the level at which
correspondence is established. By this we mean the degree of preprocessing
and the complexity of the participating tokens. Matching may be established
between simple tokens such as points, blobs, and edge fragments. Alternatively,
the matching process may operate on complex tokens such as structured forms
or even the images of recognized objects. The use of complex tokens can simplify
the correspondence process, since a complex token will usually have a unique
counterpart in a subsequent frame. Primitive tokens will usually have many
competing possible matches, but their use has two distinct advantages. The first
is a reduced preprocessing requirement, which is of special importance in motion
perception, where computation time is severely restricted. The second is that a
correspondence scheme based on primitive tokens can operate on arbitrary objects
engaged in complex shape changes. It seems, therefore, that the correspondence
process should operate on the level of rather primitive elements, perhaps at the
level of Marr’s full primal sketch [51,52].

The second general problem concerns the possible role of intensity-based and
token-matching schemes in an integrated vision system. Intensity-based schemes
can be fast and sensitive, but the ambiguity of the local measurements may
make it difficult to recover the velocity field accurately. A token-matching scheme
can, in principle, track a sharply localized token (such as a line termination) over
long distances, and thereby achieve a high degree of accuracy, at the price of
more extensive processing in locating the tokens and solving the correspondence
problem.

In light of the differences in their basic properties, it is possible that the two 
motion measurement schemes serve distinct visual tasks. The intensity-based system 
may be useful as an "early warning" system, and for the separation of moving
objects from their background. Token-matching schemes may play an important
role in the recovery of structure from motion, where accurate tracking over
considerable distances is useful. A second possibility is that the two schemes
complement each other. For example, a token-matching scheme might
be guided by additional constraints supplied by an intensity-based system.

2.3. Two Motion Systems in Human Vision

Psychological studies of motion detection and measurement in the human system 
have distinguished two types of visual motion: discrete and continuous. For human 
observers to perceive motion, the stimulus need not move continuously across the 
visual field. Under the appropriate spatial and temporal presentation parameters, a
stimulus presented sequentially can produce the impression of smooth, uninterrupted
motion (as in motion pictures) [53]. The visual system can "fill in" the gaps in the
discrete presentation even when the stimuli are separated by up to several degrees
of visual angle, and by long temporal intervals (400 msec. [54]). The resulting
motion, termed "apparent" or "beta" motion, is perceptually indistinguishable from
continuous motion. 

The apparent motion phenomena raise the question of whether discrete and 
continuous motion are registered by two different mechanisms. The fact that the 
visual system can register both types of motion does not imply the existence of 
two separate mechanisms, since a system that registers discrete motion could in 
principle register continuous motion as well. Psychophysical evidence supports, 
however, the view that two different mechanisms are in fact involved in the process 
of motion detection and measurement [55-60]. The terms "short range" and "long
range" processes were suggested by Braddick [58] for the two mechanisms. The
short range mechanism registers continuous motion, or motion presented discretely,
with displacements of up to about 15 min. of arc (in the center of the visual
field) and temporal intervals of less than about 60-100 msec. The long range
mechanism can process larger displacements and temporal intervals. Braddick’s
terminology characterizes the distinction between the two mechanisms better than
the discrete/continuous dichotomy, since discrete presentation with jumps of up to 
15 min. of visual arc will be processed by the short range mechanism. 

In the human visual system, it appears that the short range process is an 
intensity-based scheme, whereas the long range process is a token-matching scheme.
Braddick [58] proposed that the directionally-selective units of visual cortex underlie
the short range process, suggesting that the spatial and temporal limits reflect
the spatial and temporal parameters of these neural units. Marr and Ullman
[12] present a gradient scheme for the detection and measurement of motion,
which includes a model for constructing the directionally-selective units, and an 
algorithm for combining the local measurements to compute the two-dimensional 
velocity field. The long range motion phenomena illustrate our ability to derive a 
correspondence of elements in the changing image, over considerable distances and 
temporal intervals. In these situations, there is no continuous motion of elements 
across the retina to be measured directly. Psychophysical studies have shown the 
long range correspondence to be based on more symbolic primitives, such as edges,
bars, blobs, simple groups of primitive elements, and texture edges [13,61]. 

2.4. Summary 

To summarize, several methods are available for the detection and measurement 
of motion. These methods differ in the constraints they derive from the changing 
image. Intensity-based schemes utilize the spatial and temporal changes in the image 
intensity pattern to constrain local velocity, while token-matching schemes extract 
more symbolic tokens from the image, which are then matched over time. These 
two techniques for motion analysis give rise to different computational problems, 
and consequently to different kinds of processes in biological and computer vision 
systems. 




3. Deriving Velocity Constraints from the Image 

In this section, we first present a scheme for extracting initial motion constraints 
from the image, proposed by Marr and Ullman [12], which was motivated by 
computational studies of early visual processing, and neurophysiological studies 
of directionally-selective simple cells in primate visual cortex. The use of this
type of initial motion measurement raises the motion integration problem; the
measurements do not yet specify the true motion of objects in the changing image,
and must be integrated in some way to compute the velocity field.

Computational studies suggested that the first stage of image analysis should be the
detection of intensity changes (see [62] for a review). Marr and Hildreth [63] have proposed
that an optimal operator for the initial filtering of the image is the Laplacian
of a Gaussian, $\nabla^2 G$, whose shape may be approximated by the difference of
two Gaussians. The elements in this convolution output, which correspond to the
location of intensity changes, are the zero-crossings [64]. Figure 5 shows an image
which has been processed through a $\nabla^2 G$ filter, and the resulting zero-crossing
contours. Marr and Hildreth suggested that the convolution of the image with $\nabla^2 G$
is represented in the output of the retinal ganglion X-cells, and that a class of
simple cells in visual cortex assumes the role of zero-crossing detection.



Figure 5. The detection of intensity changes. (a) The original image. (b) The convolution
of (a) with a $\nabla^2 G$ operator. (c) The resulting zero-crossing contours.
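
A one-dimensional analogue of this filtering stage can be sketched as follows: a step edge is convolved with a sampled second derivative of a Gaussian, and the sign change in the output marks the edge. The kernel width, the truncation at four standard deviations, the edge-value padding, and all function names are assumptions made for the example rather than parameters from the cited work.

```python
import math


def d2_gaussian(x, sigma):
    """Second derivative of a unit-area Gaussian of standard deviation sigma."""
    g = math.exp(-x * x / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return g * (x * x - sigma ** 2) / sigma ** 4


def convolve_same(signal, kernel):
    """Discrete convolution with edge-value padding; output has len(signal) samples."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + half - k, 0), len(signal) - 1)
            acc += weight * signal[j]
        out.append(acc)
    return out


def zero_crossings(values):
    """Indices i where the sign of `values` changes between i and i + 1."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < 0]


sigma = 2.0
kernel = [d2_gaussian(k, sigma) for k in range(-8, 9)]
step = [0.0] * 20 + [1.0] * 20          # intensity jumps between samples 19 and 20
response = convolve_same(step, kernel)
crossings = zero_crossings(response)
```

The single zero-crossing falls between the two samples that straddle the step, which is the sense in which zero-crossings localize intensity changes.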

Marr and Ullman [12] have extended this model for simple cells, including a 
mechanism for their directional selectivity. The basic idea is illustrated in Figure 
6. Figure 6a shows the one-dimensional output of the convolution of a step-edge 
intensity profile with the second derivative of a Gaussian, $D^2 G * I$. Figure 6b and
Figure 6c illustrate the time derivative, $\frac{\partial}{\partial t}(D^2 G * I)$, for motion of the profile to
the left and right, respectively. At the location of the zero-crossing Z, the time
derivative will be negative for motion to the left, and positive for motion to the
right. Similar to the gradient scheme introduced in Section 2, the sign of contrast
of the zero-crossing can be compared with the sign of the temporal derivative, to
compute the direction of motion of the zero-crossing. By combining the magnitude
of the slope of the convolution output as it crosses zero with the magnitude of the
time derivative, a rough magnitude of velocity can be computed. In two dimensions,
comparison of the spatial and temporal derivatives of $\nabla^2 G * I$ (where $I$ is now a
two-dimensional intensity distribution) at the location of zero-crossings provides
only the component of motion in the direction perpendicular to the local orientation
of the contour.
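
The one-dimensional computation can be sketched under simplifying assumptions. For a unit step edge, $D^2 G * I$ equals the first derivative of the Gaussian centred on the edge, so the response to an edge moving at velocity $v$ is $S(x,t) = G'(x - vt)$. At the zero-crossing, the ratio $-S_t / S_x$ recovers $v$. The function names, the value of sigma, and the sign convention (positive x rightward) are assumptions of this sketch, not part of the Marr-Ullman model.

```python
import math


def make_edge_response(velocity, sigma=2.0):
    """S(x, t) = G'(x - velocity * t): the D2G response to a moving unit step edge."""
    def response(x, t):
        u = x - velocity * t
        g = math.exp(-u * u / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
        return -u / sigma ** 2 * g
    return response


def velocity_at_zero_crossing(response, x, t, h=1e-4):
    """Signed speed from the spatial slope and temporal derivative at (x, t)."""
    slope = (response(x + h, t) - response(x - h, t)) / (2 * h)
    rate = (response(x, t + h) - response(x, t - h)) / (2 * h)
    return -rate / slope


# The zero-crossing sits at x = velocity * t.
leftward = velocity_at_zero_crossing(make_edge_response(-1.5), 0.0, 0.0)
rightward = velocity_at_zero_crossing(make_edge_response(0.8), 0.8, 1.0)
```

Comparing only the signs of `slope` and `rate` gives the direction of motion; the ratio of their magnitudes gives the speed.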

Marr and Ullman have proposed that the retinal ganglion Y-cells carry the 
time derivative of the $\nabla^2 G$ convolution, and that simple cells combine the spatial
and temporal derivatives, carried by the X-system and Y-system (via the LGN),
to compute the direction of motion of the zero-crossing contours. A neural model
for the derivation of the spatial and temporal derivatives has been proposed by
Richter and Ullman [65]. Recent neurophysiological studies support the role of
simple cells in the detection of zero-crossings [Richter, personal communication]. In
addition to neurophysiological support, this scheme appears to be consistent with
psychophysical studies of the short-range process [12].



Figure 6. The Marr-Ullman scheme. (a) Convolution of a step intensity change with
$D^2 G$. (b) and (c) Temporal intensity derivative for motion of the profile to the left and
right.


From a computational standpoint, restricting the measurement of motion to
the location of zero-crossings has two advantages over schemes based only on the raw
intensities. First, the zero-crossings of $\nabla^2 G * I$ correspond to points in the image at
which the gradient of intensity is locally maximum, yielding the most reliable local
velocity measurements. Second, the zero-crossings are tied more closely to physical
features; if the zero-crossings move, it is more likely to be the consequence of
movement of an underlying physical surface. There are many factors that can cause
intensity to change locally, such as changing illumination; a change in intensity 
over time is not necessarily due to the motion of an underlying surface. 


The zero-crossing scheme presented above does not yet solve the motion 
measurement problem. The measurement of the motion of zero-crossings, using 
a local gradient scheme, provides only the component of motion in the direction 
perpendicular to the orientation of the contour. The component of velocity along the 
contour remains undetected. More formally, we may express the velocity field along
a contour by the function V(s), where s denotes arclength. V(s) can be decomposed
into components tangent and perpendicular to the contour, as illustrated in Figure
7. $\mathbf{u}^{T}(s)$ and $\mathbf{u}^{\perp}(s)$ are unit vectors in the directions tangent and perpendicular to
the curve, and $v^{T}(s)$ and $v^{\perp}(s)$ denote the magnitudes of the two components:

$$V(s) = v^{T}(s)\,\mathbf{u}^{T}(s) + v^{\perp}(s)\,\mathbf{u}^{\perp}(s) \qquad (1)$$



Figure 7. The decomposition of velocity V(s) into tangential and perpendicular
components.

The component $v^{\perp}(s)$ is given directly by the initial measurements from the
changing image; the computation of V(s) requires the further recovery of $v^{T}(s)$.

At the very least, the computation of V(s) requires the integration of the
constraints provided by $v^{\perp}(s)$ along the contour. In general, however, the solution
may still be underdetermined. Additional constraint is required to compute a single
velocity field. Figure 8 illustrates two examples in which the velocity field solution
is not unique. In Figure 8a, the solid and dotted lines represent the image of a
moving circle at different instants of time. In the first frame (solid line), the circle
lies parallel to the image plane, while in the second frame, the circle is slanted in
depth. One velocity field consistent with this sequence is derived from pure rotation
of the circle about the central vertical axis, as shown to the left in Figure 8a.
(The arrows represent local velocities.) However, there could also be a component
of rotation in the plane of the circle, about its center, as shown to the right in
Figure 8a. Both velocity fields correspond to valid rigid motions of the circle. This
ambiguity is not particular to circles. In Figure 8b, the solid curve $C_1$ rotates,
translates and deforms over time, to yield the dotted curve $C_2$. The mapping of
points from $C_1$ to $C_2$ is less clear (consider, for example, different possible
velocities for the point p). The precise computation of the velocity field in this case
is important when one considers the subsequent computation of structure from
motion; different choices for the velocity field may yield different three-dimensional
structures. The computation of a unique velocity field requires additional assumptions
about physical surfaces, and the velocity field that they generate under
motion.

Figure 8. Ambiguity of the velocity field computation. (a) A circle, rotating in depth.
(b) A deforming curve.

In conclusion, the computation of the velocity field, for the case of general 
motion, requires a scheme that combines local measurements of motion from the 
changing image, subject to additional constraints. This is the motion integration 
problem. 


4. The Integration of Local Motion Measurements

In this section, we discuss the motion integration problem, strictly from a 
computational viewpoint. The results that we present here are largely independent 
of the nature of the initial motion measurements, and in particular, do not depend 
on the Marr-Ullman scheme discussed previously. This section will be organized by 
the type of additional constraint that may be utilized in the combination stage. We 
will consider four types of additional constraint on the velocity field: (1) velocity 
is constant over an area of the image (valid for pure translation); (2) the velocity 
field is consistent with rigid rotation and translation of objects in the image plane; 
(3) the velocity field is smooth, and exhibits the least variation among the set 
of velocity fields consistent with the image constraints; and (4) the velocity field 
is smooth, exhibits the least variation possible, and is constant over small time 
intervals. We will discuss methods for combining local measurements, given each of 
the four types of constraint. 

4.1. The Constant Velocity Constraint 

Much of the previous work in motion analysis has addressed the case of pure 
translation of objects in the image plane. The early gradient schemes used in 
computer vision [34,66] assumed that velocity would be constant over a large 
area of the image. Most correlation and subtraction schemes also embodied this 
assumption. Marr and Ullman [12,67] proposed a scheme in which each local 
measurement restricts the true velocity of a patch to lie within a 180° range of 
directions to one side of a segment of the local zero-crossing contour. A set of
measurements taken at different orientations along a zero-crossing contour then
further restricts the allowable velocity directions, until a single velocity direction is
obtained that is consistent with all the local measurements.
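
The narrowing of allowable directions can be sketched by brute force. Each local measurement is represented here by the direction (in degrees) of its measured perpendicular motion, and a candidate velocity direction is kept only if it has a positive component along every such direction. The one-degree sampling, the tolerance, and the example angles are assumptions of this sketch; it is not the Marr-Ullman implementation.

```python
import math


def consistent_directions(normal_angles_deg, step_deg=1):
    """Candidate directions (degrees) with a positive component along every
    measured perpendicular direction, i.e. inside every 180-degree range."""
    allowed = []
    for theta in range(0, 360, step_deg):
        if all(math.cos(math.radians(theta - a)) > 1e-9 for a in normal_angles_deg):
            allowed.append(theta)
    return allowed


# One measurement leaves an open 180-degree range of directions.
one_measurement = consistent_directions([0])

# Measurements at three hypothetical contour orientations narrow the range.
three_measurements = consistent_directions([0, 80, 280])
```

With the three example measurements, only directions within ten degrees of 0 survive; further measurements at other orientations would narrow the range again.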

The scheme that we present in the next section, for analyzing rotation and
translation in the image plane, may also be used for the restricted case of pure
translation. While these schemes cannot account for the full range of human motion
perception, they may be useful for the initial detection and rough measurement
of motion in the periphery, or analysis of motion during smooth pursuit eye
movements, in which stationary objects translate rigidly with respect to the eye.
In computer vision, there are restricted applications for these techniques, such as
the tracking of objects along a conveyor, or computation of camera motion [68].

4.2. Rigid Motion in the Image Plane 


In this and the remaining sections, we will focus on the motion of contours. The results
apply, however, to continuous patches in the image as well. First, suppose we have
a rigid curve undergoing general motion in space. Its instantaneous motion may
be described as the combination of: (1) a rotation with angular velocity $\omega$ about a
single axis in space, which we will denote by the vector $\mathbf{n} = [n_1, n_2, n_3]^T$ ($T$ denotes
the transpose of a vector), and (2) a translation, which we will denote by the vector
$\mathbf{d} = [d_1, d_2, d_3]^T$. Let the curve be given parametrically by $C = (x(s), y(s), z(s))$.
The location of a point on the curve may be given by the position vector
$\mathbf{r} = [x(s), y(s), z(s)]^T$. If we let the optical axis lie along the $z$-axis, and let the projection
of the curve onto the image plane (the $(x, y)$ plane) be orthographic, then the
two-dimensional velocity field V(s) along the contour is given by:


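$$V(s) = M(\mathbf{r} \times \omega\mathbf{n} + \mathbf{d}) = \omega z(s) \begin{bmatrix} -n_2 \\ n_1 \end{bmatrix} + \omega n_3 \begin{bmatrix} y(s) \\ -x(s) \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \qquad (2)$$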

M denotes the matrix which performs the orthographic projection. The first term in
the resulting expression describes the component of the velocity field due to rotation
in depth about an axis parallel to the image plane (the axis $\mathbf{n} = [n_1, n_2, 0]^T$); the
second term is the component due to rotation in the image plane (about
the axis $\mathbf{n} = [0, 0, n_3]^T$); and the third term is the translation component.
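
The three-term form can be checked numerically against the cross product. The sketch below takes the product in the order written in the text, $\mathbf{r} \times \omega\mathbf{n}$, and keeps the first two components as the orthographic projection. The function names and the numerical values are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def projected_velocity(r, omega, n, d):
    """M(r x omega*n + d): the x and y components of the 3-D velocity."""
    c = cross(r, tuple(omega * component for component in n))
    return (c[0] + d[0], c[1] + d[1])


def three_term_velocity(r, omega, n, d):
    """Rotation in depth + rotation in the image plane + translation."""
    x, y, z = r
    depth = (-omega * z * n[1], omega * z * n[0])
    in_plane = (omega * n[2] * y, -omega * n[2] * x)
    return (depth[0] + in_plane[0] + d[0], depth[1] + in_plane[1] + d[1])


r = (1.0, 2.0, 3.0)
omega = 0.5
n = (2.0 / 7.0, 3.0 / 7.0, 6.0 / 7.0)   # a unit axis
d = (0.1, -0.2, 0.3)
direct = projected_velocity(r, omega, n, d)
expanded = three_term_velocity(r, omega, n, d)
```

The two results agree to rounding error. Note that the depth component $d_3$ of the translation does not appear in the image velocity under orthographic projection.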

Consider the restricted case of rigid motion in the image plane; the velocity
field now corresponds to the combination of a translation, and rotation about the
axis $\mathbf{n} = [0, 0, 1]^T$. Thus, V(s) is given by:


$$V(s) = \omega \begin{bmatrix} -y(s) \\ x(s) \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \qquad (3)$$

V(s) is simply a translation, rotation and scaling of the image curve (x(s), y(s)), 
as illustrated in Figure 9. In Figure 9a, the curve $C_1$ undergoes a small rotation 
and translation in the image plane to yield the curve $C_2$. The arrows indicate local 
velocity vectors along the curve. In Figure 9b, these velocity vectors have been 
translated to a common origin in velocity space, where the x and y axes represent 
the x and y components of velocity. The curve in velocity space has the same shape 
as the image curve $C_1$; its size is proportional to angular velocity $\omega$, and it is rotated 
90° with respect to $C_1$ (this relationship is also used in kinematics [69]). 




The additional translation of the curve in the image yields the same translation of 
the curve in velocity space. In the case of pure translation, the image of the velocity 
field in velocity space degenerates to a single vector. In general, the explicit use of 
the velocity space aids in the visualization of properties of the motion of curves, 
and provides a tool for establishing theoretical properties of the velocity field. 
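The velocity-space relationship for rigid motion in the image plane can be checked numerically. The following is a minimal sketch, assuming an arbitrary sample curve and arbitrary values of $\omega$ and $d$; the function names `velocity_field` and `rotate_scale_translate` are hypothetical and are introduced only for this illustration. It evaluates equation (3) at sampled points and compares the result with a copy of the image curve that has been rotated by 90°, scaled by $\omega$, and translated by $(d_1, d_2)$.

```python
import math

def velocity_field(curve, omega, d):
    """Equation (3): V = omega * [-y, x] + [d1, d2] at each curve point (x, y)."""
    return [(-omega * y + d[0], omega * x + d[1]) for x, y in curve]

def rotate_scale_translate(curve, angle, scale, shift):
    """Rotate each point counterclockwise by angle, scale it, then translate it."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in curve]

# Hypothetical sample curve (an elliptical arc) and motion parameters.
curve = [(2.0 * math.cos(0.3 * k), math.sin(0.3 * k)) for k in range(10)]
omega, d = 0.05, (0.4, -0.1)

V = velocity_field(curve, omega, d)
W = rotate_scale_translate(curve, math.pi / 2, omega, d)

# Largest distance between the velocity-space curve and the transformed image curve.
max_diff = max(math.hypot(vx - wx, vy - wy) for (vx, vy), (wx, wy) in zip(V, W))

# Pure translation (omega = 0): the velocity-space curve collapses to the single vector d.
pure_translation = velocity_field(curve, 0.0, d)
```

Under these assumptions `max_diff` is zero up to rounding error, and every entry of `pure_translation` equals `d`, which matches the degenerate case described in the text.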

For the simple case of rigid motion in the image plane, this relationship between 
the shape of the curve and the velocity field is not restricted to continuous motion 
of the curve. For discrete motion of a curve, we will use the term displacement 
field for the set of vectors which describe the discrete displacement of points on the 
curve. If we let $\sigma$ be a discrete angular rotation of the curve in the image plane, 
then the displacement field $V_d(s)$ is given by: 


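$$V_d(s) = \begin{bmatrix} \cos\sigma - 1 & \sin\sigma \\ -\sin\sigma & \cos\sigma - 1 \end{bmatrix} \begin{bmatrix} x(s) \\ y(s) \end{bmatrix} + \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \qquad (4)$$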



$V_d(s)$ is again given by a scaling and rotation of the projected image curve 
$[x(s), y(s)]$. In this case, the scale factor $k$ is given by: 

$$k = \sqrt{(\cos\sigma - 1)^2 + \sin^2\sigma} = \sqrt{2(1 - \cos\sigma)} \qquad (5)$$

The angle of rotation $\alpha$ between the image curve (x(s), y(s)) and the 
corresponding curve in velocity space is given by: 

$$\tan\alpha = \frac{\sin\sigma}{\cos\sigma - 1} = -\cot\left(\frac{\sigma}{2}\right)$$
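The scale factor and rotation angle of the displacement field can be verified with a short numerical check. This is only a sketch under assumed values: the rotation $\sigma = 0.3$ radians and the test point are arbitrary, and the helper names `displacement_matrix` and `scale_factor` are hypothetical. The matrix of equation (4) has the form of a scaled rotation, so it should lengthen every vector by the factor $k$ of equation (5).

```python
import math

def displacement_matrix(sigma):
    """Matrix of equation (4), using the sign convention given there."""
    c, s = math.cos(sigma), math.sin(sigma)
    return ((c - 1.0, s), (-s, c - 1.0))

def scale_factor(sigma):
    """Equation (5): k = sqrt(2 (1 - cos sigma))."""
    return math.sqrt(2.0 * (1.0 - math.cos(sigma)))

sigma = 0.3                      # assumed discrete rotation, in radians
x, y = 1.5, -0.7                 # arbitrary point on the image curve
(m11, m12), (m21, m22) = displacement_matrix(sigma)

# Displacement of the point with the translation (d1, d2) set to zero.
ux, uy = m11 * x + m12 * y, m21 * x + m22 * y

ratio = math.hypot(ux, uy) / math.hypot(x, y)   # measured scaling of the vector
k = scale_factor(sigma)

tan_alpha = math.sin(sigma) / (math.cos(sigma) - 1.0)
minus_cot = -1.0 / math.tan(sigma / 2.0)
```

Here `ratio` agrees with `k`, and `tan_alpha` agrees with `minus_cot`, as the half-angle identities require. The same identities give the equivalent form $k = 2\sin(\sigma/2)$ for $0 \le \sigma \le \pi$.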
allowed to move freely in space, and deform over time. The specific analysis will 
assume that we have measured the perpendicular components of velocity along 
contours in the image. However, the general constraint that we present may be 
utilized in other motion measurement schemes as well. 

The expression (2) related V(s) to the global motion parameters $\omega$ and $n$, and 
the shape of the curve $C = (x(s), y(s), z(s))$. The relationship between V(s) and C 
is quite simple. If we map the projected two-dimensional velocity vectors along the 
curve to a common origin in velocity space, their endpoints map out a scaled, 90° 
rotation of the projected image curve (x(s), y(s)), with an additional distortion along 
one direction. This distortion is directed perpendicular to the axis $n = [n_1, n_2, 0]^T$, 
and is scaled by the z component of the curve, z(s). This relationship implies, for 
example, that if we have a smooth curve in motion, it must generate a smoothly 
varying velocity field. The real world consists predominantly of solid objects, whose 
surfaces are generally smooth compared with their distance from the viewer. Thus, 
intuitively, we seek a velocity field which is consistent with the constraints we derive 
from the changing image, and which varies smoothly. A single solution might be 
obtained by finding the velocity field which varies as little as possible. A similar 
argument was used by Horn and Schunck [35] to motivate the use of a smoothness 
constraint for the optical flow computation. In our case, we seek a smooth velocity 
field along a contour. 

To achieve this, we need some means of measuring the variation in velocity 
along a contour. There are various ways in which this could be done. For example, 
we could measure the change in direction of velocity as we trace along the contour. 
Total variation of the velocity field could then be defined as the total change in 
direction over the entire contour. A second definition involves measuring the change 
in magnitude of velocity along the contour. This leads to a velocity field solution for 
which speed is as uniform as possible along the contour. Finally, we could measure 
the change in the full velocity vector, $\partial V/\partial s$, incorporating both the direction and 
magnitude of velocity. 

In order to define the variation of the velocity field more formally, first recall 
the decomposition of velocity into components tangent and perpendicular to the 
curve: 

$$V(s) = v^{\top}(s)\, u^{\top}(s) + v^{\perp}(s)\, u^{\perp}(s) \qquad (1)$$

$u^{\top}(s)$, $u^{\perp}(s)$ and $v^{\perp}(s)$ can be measured directly from the changing image. $v^{\top}(s)$ 
is unknown, and must be recovered in order to compute the velocity field V(s). 
Aside from knowing $v^{\perp}(s)$ everywhere along the curve, there may also be points at 
which the direction and magnitude of velocity, and hence both $v^{\perp}(s)$ and $v^{\top}(s)$, 

are known. In addition, the direction of velocity alone, and hence the ratio 
$v^{\perp}(s) / v^{\top}(s)$, 
may be known at points on the curve, for example, where $v^{\perp}(s) = 0$ (Section 4.2). 
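The decomposition of velocity into tangential and perpendicular components can be illustrated with a small sketch. The function name `decompose` and the sample values are hypothetical, and the choice of the unit normal as the tangent rotated 90° counterclockwise is a convention assumed here rather than one stated in the text.

```python
import math

def decompose(V, tangent):
    """Split V into components along the unit tangent and the unit normal."""
    tx, ty = tangent
    norm = math.hypot(tx, ty)
    tx, ty = tx / norm, ty / norm
    nx, ny = -ty, tx                      # assumed convention: tangent rotated 90 degrees
    v_tan = V[0] * tx + V[1] * ty         # tangential component (not measurable locally)
    v_perp = V[0] * nx + V[1] * ny        # perpendicular component (measurable locally)
    return v_tan, v_perp, (tx, ty), (nx, ny)

V = (3.0, 1.0)                            # arbitrary true velocity at a contour point
v_tan, v_perp, u_tan, u_perp = decompose(V, (1.0, 1.0))

# Recombining the two components recovers the original vector.
rebuilt = (v_tan * u_tan[0] + v_perp * u_perp[0],
           v_tan * u_tan[1] + v_perp * u_perp[1])
```

Only `v_perp`, `u_tan` and `u_perp` correspond to quantities available from the changing image; `v_tan` is computed here from the known vector `V` purely to show that the two components together reproduce it.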

We can now consider a more formal means for measuring the variation in the 
velocity field. Mathematically, this can be accomplished by defining a functional 
$\Theta$, which maps the space of all possible vector fields (along the contour), $\mathcal{V}$, into 
the real numbers: $\Theta : \mathcal{V} \to \mathbb{R}$. This functional should be such that the smaller the 
variation in the velocity field, the smaller the real number assigned to it. Two 
candidate velocity fields may then be compared, by comparing their corresponding 
real numbers. This raises the question of what functional should be used to measure 
the variation of a velocity field. In the remainder of this section, we will evaluate 
a set of possible functionals, based on the three measures of variation that we 
previously presented informally: (1) variation in V(s), (2) variation in the direction 
of velocity, and (3) variation in the magnitude of velocity, all with respect to the 
curve. 

(1) Variation in V(s) 

A scalar measure of the local variation of V(s) with respect to the curve is given by 
$|\partial V(s)/\partial s|$, shown in Figure 11a. Two nearby velocity vectors along the image curve are 
translated to a common origin in velocity space, where the vector $\partial V/\partial s$ is shown 
with a dotted arrow. For convenience of notation, we will omit the argument to 
V(s), writing $|\partial V/\partial s|$. A measure of the total variation of the velocity field along the 
curve may then be given by the functional: 

$$\Theta(V) = \int \left| \frac{\partial V}{\partial s} \right| ds$$

Figure 11. Measuring variation in the velocity field (a) Change in the full velocity 
vector (b) Change in direction of velocity 
We may also consider variations on this functional, involving higher order derivatives 
or higher powers, such as: 

$$\Theta(V) = \int \left| \frac{\partial^2 V}{\partial s^2} \right| ds \quad \text{or} \quad \Theta(V) = \int \left| \frac{\partial V}{\partial s} \right|^2 ds$$

(2) Variation in Direction 

Let the direction of velocity be given by the angle $\varphi$, measured in the clockwise 
direction from the horizontal, as shown in Figure 11a. In Figure 11b, $\partial\varphi/\partial s$, for two 
nearby velocity vectors along the image curve, is shown in velocity space. Total 
variation of direction along the curve could be given by functionals such as the 
following: 

$$\Theta(V) = \int \left| \frac{\partial \varphi}{\partial s} \right| ds$$

or variations involving higher order derivatives, or higher powers. 

(3) Variation in Magnitude 

Finally, we could measure the change in magnitude of velocity alone using 
functionals such as: 

$$\Theta(V) = \int \left| \frac{\partial |V|}{\partial s} \right| ds$$

Again, we could also consider variations on this measure. 
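The three kinds of measure can be compared on a sampled velocity field. The sketch below is an illustration only: it replaces the integrals by sums over adjacent samples of an open contour, assumes uniform sample spacing `ds`, and uses the hypothetical function name `variation_measures`. The direction is computed with `atan2`, which measures counterclockwise; the absolute change in direction is the same under either sign convention.

```python
import math

def variation_measures(V, ds=1.0):
    """Discrete versions of the variation functionals for a sampled field V."""
    total_vec = total_vec_sq = total_dir = total_mag = 0.0
    for (ax, ay), (bx, by) in zip(V, V[1:]):
        dv = math.hypot(bx - ax, by - ay)
        total_vec += dv                       # approximates the integral of |dV/ds|
        total_vec_sq += dv * dv / ds          # approximates the integral of |dV/ds|^2
        dphi = math.atan2(by, bx) - math.atan2(ay, ax)
        dphi = (dphi + math.pi) % (2.0 * math.pi) - math.pi   # wrap into [-pi, pi)
        total_dir += abs(dphi)                # approximates the integral of |dphi/ds|
        total_mag += abs(math.hypot(bx, by) - math.hypot(ax, ay))  # integral of |d|V|/ds|
    return total_vec, total_vec_sq, total_dir, total_mag

# A constant field has no variation under any of the measures.
constant = variation_measures([(1.0, 2.0)] * 5)

# A field of unit speed whose direction turns through 90 degrees in four steps.
turning_field = [(math.cos(k * math.pi / 8), math.sin(k * math.pi / 8)) for k in range(5)]
turning = variation_measures(turning_field)
```

For the turning field, the direction measure accumulates the full quarter turn while the magnitude measure stays at zero, which shows how the measures can rank the same field quite differently.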

The functional that we use to measure smoothness may also incorporate a 
measure of the velocity field itself, rather than strictly utilizing changes in the 
velocity field along the curve. For example, we could incorporate a term which is 
a function of |V|. This might be useful if we sought a velocity field which also 
exhibits the least total motion. In addition, the functional could become arbitrarily 
complex in its combination of $|\partial V/\partial s|$, $|\partial\varphi/\partial s|$, or higher order derivatives. 

We have at least three means of evaluating these measures of smoothness. 
From a mathematical point of view, there should exist a unique velocity field 
which minimizes our particular measure of smoothness; this requirement imposes 
a set of mathematical constraints on our functional. Second, the velocity field 
computation should yield physically plausible solutions. Finally, if we suggest that 
such a smoothness constraint underlies the motion computation in the human 
visual system, this minimization should yield a velocity field consistent with human 
motion perception. 

An examination of these smoothness measures from a physical and mathematical 
point of view suggests that a measure involving the full velocity vector, such as 
$\Theta(V) = \int |\partial V/\partial s|^2 \, ds$, is most appropriate for the velocity field computation [37]. Of 
particular importance are the mathematical properties of this functional. It can 
be shown that, given a simple condition on the constraints that we derive from 
the image, there exists a unique velocity field which satisfies our constraints, and 
minimizes $\int |\partial V/\partial s|^2 \, ds$. This condition is almost always satisfied by our initial motion 
measurements. To obtain this result, we take advantage of the analysis used by 
Grimson [70] for evaluating possible functionals for performing surface interpolation 
from stereo data. The basic mathematical question is, what conditions on the form 
of the functional, and the structure of the space of velocity fields, are needed to 
guarantee the existence of a unique solution? These conditions are captured by the 
following theorem (see also [71]): 

Theorem: Suppose there exists a complete semi-norm $\Theta$ on a space of functions 
$H$, and that $\Theta$ satisfies the parallelogram law. Then, every nonempty closed 
convex set $E \subseteq H$ contains a unique element $v$ of minimal norm, up to an 
element of the null space. Thus, the family of minimal functions is 

$$\{ v + s \mid s \in S \}$$

where 

$$S = \{ v - w \mid w \in E \} \cap \mathcal{M}$$

and $\mathcal{M}$ is the null space of the functional 

$$\mathcal{M} = \{ u \mid \Theta(u) = 0 \}.$$


It can be shown that the functional $\{\int |\partial V/\partial s|^2 \, ds\}^{1/2}$ is a complete semi-norm, which 
satisfies the parallelogram law. Second, the space of all possible velocity fields, 
which satisfy the constraints derived from the image, is convex. It then follows 
from the above theorem that this space contains a unique element of minimal norm, 
up to an element of the null space. Since our smoothness measure is non-negative, 
minimizing $\{\int |\partial V/\partial s|^2 \, ds\}^{1/2}$ is equivalent to minimizing $\int |\partial V/\partial s|^2 \, ds$. 

The null space in this case is the set of constant velocity fields, since 
$\int |\partial V/\partial s|^2 \, ds = 0$ implies $|\partial V/\partial s| = 0$ everywhere, which implies V(s) constant. Suppose 
we have a point $(x(s_i), y(s_i))$ on the curve, where $v^{\perp}(s_i)$ is known. This measurement 
constrains the velocity $V(s_i)$ to lie along a line parallel to the tangent of the curve 
at this point, as shown in Figure 12. Suppose we have a velocity field which is 
consistent with this measure. We can now only add a uniform translation component 
along the direction of this line, and still obtain a velocity field consistent with 
this local measure. If $v^{\perp}(s)$ is known at a second point $(x(s_j), y(s_j))$, for which the 
direction of the tangent is different (see Figure 12b), then we can only add a uniform 
translation component along this second direction, and still obtain a velocity field 
consistent with $v^{\perp}(s_j)$. However, we cannot add a uniform translation to the entire 
velocity field, which is consistent with both local measurements. Thus, we conclude 
the following: If $v^{\perp}(s)$ is known at two points, for which the orientation of the 
curve is different, then there exists a unique velocity field which satisfies the known 
velocity constraints and minimizes $\int |\partial V/\partial s|^2 \, ds$. An extended straight line will not 
yield measurements for two different orientations, but in all other cases, there will 
be sufficient information along a contour to guarantee a unique solution to the 
velocity field. 

Figure 12. Uniqueness of the velocity field. (a) Constraint provided by a single 
measurement (b) The constraint imposed by two measurements 
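To make the minimization concrete, the following sketch recovers the velocity field of least variation for a sampled closed contour. It is not the algorithm of the paper (which defers to standard optimization methods); it is a simple Gauss-Seidel relaxation chosen here for brevity. It assumes unit normals, uniformly spaced samples, and the discrete cost $\sum_i |V_{i+1} - V_i|^2$; the function name `smoothest_field` and the test case of a translating circle are hypothetical.

```python
import math

def smoothest_field(normals, v_perp, sweeps=2000):
    """Relax the unknown tangential components on a closed, uniformly sampled contour.

    Minimizes sum_i |V[i+1] - V[i]|^2 subject to V[i] . normals[i] = v_perp[i].
    """
    n = len(normals)
    tangents = [(-ny, nx) for nx, ny in normals]   # unit tangents
    a = [0.0] * n                                  # unknown tangential components

    def velocity(i):
        (nx, ny), (tx, ty) = normals[i], tangents[i]
        return (v_perp[i] * nx + a[i] * tx, v_perp[i] * ny + a[i] * ty)

    for _ in range(sweeps):
        for i in range(n):
            px, py = velocity((i - 1) % n)
            qx, qy = velocity((i + 1) % n)
            tx, ty = tangents[i]
            # Setting the derivative of the cost with respect to a[i] to zero
            # gives a[i] = tangent_i . (V[i-1] + V[i+1]) / 2.
            a[i] = 0.5 * (tx * (px + qx) + ty * (py + qy))
    return [velocity(i) for i in range(n)]

# Hypothetical test case: a circle translating with velocity d.
N = 16
normals = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)) for k in range(N)]
d = (1.0, 0.5)
v_perp = [d[0] * nx + d[1] * ny for nx, ny in normals]   # what the image would provide

field = smoothest_field(normals, v_perp)
max_error = max(math.hypot(vx - d[0], vy - d[1]) for vx, vy in field)
```

For the translating circle the constant field has zero variation and satisfies every constraint, so the relaxation converges to `d` at all samples. The contour supplies many distinct orientations, which is the condition for uniqueness stated above.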

We can apply the constraint of least variation and compute a projected 
two-dimensional velocity field for any three-dimensional surface, whether rigid or 
non-rigid, undergoing general motion in space. If we measure the variation in 
the full velocity vector along a contour in the image, using a functional such as 
$\int |\partial V/\partial s|^2 \, ds$, we are guaranteed that there exists a unique solution to the velocity 
field computation that minimizes this variation. While it is not yet clear that the 
general smoothness constraint, or the particular measure $\int |\partial V/\partial s|^2 \, ds$, is the most 
appropriate for the motion computation, it is important that this measure satisfies 
certain essential mathematical requirements that the other measures do not. For 
example, the use of a functional incorporating only a measure of velocity direction, 
which will attempt to make the local velocity vectors as parallel as possible, does 
not yield functionals which are semi-norms, and consequently, does not lead to a 
unique velocity field solution. For a scheme to underlie the motion computation in 
the human visual system, it is essential that it be mathematically well-founded. 

We should note that an advantage to applying the smoothness constraint along 
contours is that the minimization of variation in the velocity field is performed 
along one dimension, rather than over two dimensions, as in the case of Horn and 
Schunck’s computation of the optical flow [35]. Secondly, to apply the smoothness 
constraint over an area of the image, it is necessary to specify a neighborhood 
size, within which constraints will be combined, and smoothness imposed on the 
velocity field. Unless we can define surface boundaries prior to the velocity field 
computation, specifying an appropriate area of the image can be difficult. In 
general, the extent of contours is more highly correlated with single surfaces. The 
smoothness constraint can be applied to single contours, reducing the problem 
of integrating motion measurements across object boundaries. Finally, there exist 
several standard algorithms for the solution of optimization problems such as this 
(see, e.g. [37]). 

4.4. Deriving Additional Constraints from the Image 

In the previous section, we used two sources of constraint on the velocity field 
computation. From the image, we utilized a single curve at a particular moment in 
time, together with the instantaneous measurements of the perpendicular component 
of velocity along the curve. As a second source of constraint, we computed the 
velocity field consistent with these image constraints, which exhibited the least 
variation along the curve. Additional constraints can be derived from the image if 
we do not restrict ourselves to the use of instantaneous measurements; for example, 
we may utilize a second curve, at some time later. If the time interval is small, 
then the displacement of points along the curve will also be small. We can then 
require that each point on the first curve project to a point on the second, with 
a velocity consistent with the instantaneous perpendicular component of velocity; 
this assumes that velocity is constant over the time interval separating the 
two curves. In addition, we could still compute the velocity field which exhibits the 
least variation. 

This approach may yield a simpler, more robust algorithm for the velocity 
field computation, because it utilizes more constraint from the image. However, it 
has the disadvantage that we may not be able to obtain the theoretical uniqueness 
results that were possible when we considered the perpendicular components of 
velocity as a sole source of constraint. A simple example in which the velocity 
field solution is not unique is shown in Figure 13. Suppose we are given the initial 
constraints shown in Figure 13a. The arrows indicate the perpendicular components 
of velocity along the first curve, and the dotted line indicates the second curve. 
There are two velocity field solutions consistent with these constraints, shown in 
Figures 13b and c, corresponding to the two directions of rotation of the circle. 
Both velocity fields exhibit the same total variation. In general, theoretical results 
on uniqueness may be more difficult to obtain for this approach to the velocity field 
computation. The use of instantaneous motion measurements alone, together with 
the additional smoothness constraint, as discussed in Section 4.3, would yield the 
velocity field given by the vectors in Figure 13a, corresponding to pure expansion 
of the circle. The additional constraint of the second curve leads to a different 
solution. 





Figure 13. Ambiguity of the velocity field. (a) The initial constraints (b) Rotation of 
the circle to the left (c) Rotation to the right 
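The ambiguity can be reproduced numerically under one possible reading of Figure 13. The sketch assumes a first circle of radius `R` with measured radial (perpendicular) speed `e`, and a second circle of radius `R2` that is larger than pure expansion alone would reach after time `dt`; these values and the names `circle_fields` and `variation` are hypothetical. Requiring each point to land on the second circle then fixes the tangential speed only up to sign.

```python
import math

def circle_fields(R, e, R2, dt, n=24):
    """Two candidate fields that carry a circle of radius R onto one of radius R2."""
    # Landing on the second circle requires (R + e*dt)^2 + (a*dt)^2 = R2^2.
    a = math.sqrt(R2 ** 2 - (R + e * dt) ** 2) / dt
    fields = {}
    for name, sign in (("ccw", 1.0), ("cw", -1.0)):
        field = []
        for k in range(n):
            th = 2.0 * math.pi * k / n
            nx, ny = math.cos(th), math.sin(th)   # outward unit normal
            tx, ty = -ny, nx                      # unit tangent
            field.append((e * nx + sign * a * tx, e * ny + sign * a * ty))
        fields[name] = field
    return fields

def variation(field):
    """Discrete total variation: sum of squared differences around the closed contour."""
    n = len(field)
    return sum((field[(i + 1) % n][0] - field[i][0]) ** 2 +
               (field[(i + 1) % n][1] - field[i][1]) ** 2 for i in range(n))

fields = circle_fields(R=1.0, e=0.1, R2=1.2, dt=1.0)
var_ccw = variation(fields["ccw"])
var_cw = variation(fields["cw"])

# Where the point (R, 0) lands under the counterclockwise solution.
vx, vy = fields["ccw"][0]
landing_radius = math.hypot(1.0 + vx * 1.0, vy * 1.0)
```

Both fields satisfy the same perpendicular measurements and reach the second circle, and `var_ccw` equals `var_cw` up to rounding, so least variation cannot choose between them in this configuration.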

The availability of the second curve may simplify the velocity field computation 
in the following way. The perpendicular component of velocity, measured at a 
point p on the first curve, constrains the velocity vector at p to project to a point 
along the line l in the second frame, shown in Figure 14. If in addition, p must 
project to a point q on the second curve, possible candidates for q may be given by 
the intersection of l with the second curve. In practice, there will be error in the 
measurement of $v^{\perp}(s)$, and $v^{\perp}(s)$ may not be constant over small time intervals. 
As a consequence, we should consider a band in the second frame, to which p must 
project. Candidates for q are then given by the intersection of this band with the 
second curve, shown in Figure 14. If the curve has fairly high local curvature, or 
undergoes rotation, then this intersection alone provides considerable constraint on 
the velocity field. However, in the worst case of an extended line undergoing pure 
translation, the second curve offers limited additional constraint. The computation 
of a precise velocity field requires further analysis of constraints derived from the 
image, together with additional assumptions. We are presently exploring algorithms 
which utilize the smoothness constraint for this subsequent computation. 


Figure 14. Use of the constraint provided by a second curve 
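The band constraint can be sketched as a simple filter over sampled points of the second curve. Everything in this example is an assumption made for illustration: the function name `candidate_matches`, the tolerance `tol` standing in for the width of the band, and the two test curves (an expanded circle and a translated straight line).

```python
import math

def candidate_matches(p, normal, v_perp, dt, second_curve, tol):
    """Points q on the second curve whose displacement from p, measured along
    the unit normal at p, lies within tol of the predicted value v_perp * dt."""
    nx, ny = normal
    matches = []
    for qx, qy in second_curve:
        along_normal = (qx - p[0]) * nx + (qy - p[1]) * ny
        if abs(along_normal - v_perp * dt) <= tol:
            matches.append((qx, qy))
    return matches

p, normal, v_perp, dt, tol = (1.0, 0.0), (1.0, 0.0), 0.1, 1.0, 0.005

# Second curve with curvature: a circle of radius 1.2 sampled every degree.
circle = [(1.2 * math.cos(math.radians(k)), 1.2 * math.sin(math.radians(k)))
          for k in range(360)]
circle_hits = candidate_matches(p, normal, v_perp, dt, circle, tol)

# Worst case: an extended straight line that has translated along its normal.
line = [(1.1, 0.1 * j) for j in range(-10, 11)]
line_hits = candidate_matches(p, normal, v_perp, dt, line, tol)
```

With these values the circle yields four candidates in two small clusters, one on each side of the normal through `p`, while every sample of the straight line remains a candidate. This mirrors the observation that curvature or rotation makes the intersection informative, whereas a translating line gains little from the second curve.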


4.5. Summary 

We have considered various additional constraints which may be used in the 
computation of the velocity field from initial motion measurements derived from 
the changing image. These constraints range from the restricted assumption of 
pure translation to the general constraint of smoothness of the velocity field, which 
allows for the arbitrary movement of rigid or non-rigid objects in space. The use of 
different constraints results in considerable variation in the classes of motion which 
may be analyzed, the type of algorithm, and the extent of theoretical analysis 
required to formulate a well-defined computational problem. In analyzing these 
constraints, we have so far restricted ourselves to addressing purely computational 
issues. In the next section, we discuss implications for the biological computation of 
motion. If the human visual system does in fact compute a detailed velocity field, 
it is likely to use as much constraint as possible from the changing image, together 
with the least restrictive additional constraints as necessary, to compute a unique 
velocity field. 


5. Some Implications Concerning the Biological Computation of Motion 

In this section, we summarize the above discussion by presenting a list of the basic 
proposals that have been made for the computation of motion. In addition, we 
discuss some of the implications of these proposals for the human visual system. 

(i) An underlying assumption of this work is that the local velocity field is explicitly 
computed and represented. For the human visual system, the idea that there exists 
an explicit computation of motion, which is different from the description of motion 
that could be provided by initial motion detectors, can be motivated by simple 
examples. In Figure 15, we show a circle and square undergoing pure translation. 
Initial motion measurements provide the component of motion in the direction 
perpendicular to the local orientation of intensity changes in the image, shown 
in Figure 15a. Our perception of the movement of the figures is pure translation, 
indicated by the set of velocity vectors in Figure 15b. A third example is that of 
the rotating and translating curve of Figure 9. While it is not clear whether we 
are capable of explicitly representing the local velocity field around the contour, 
we do perceive the movement as the rotation and translation of a rigid curve. Such 
an interpretation is not explicit in the initial motion measurements. For tasks such 
as the detection of a sudden movement, or separation of objects on the basis of 
differential motion, a precise local velocity field may not be necessary. However, 
to compute three-dimensional structure from motion, a more detailed computation 
of the velocity field, or an explicit correspondence of elements between frames, is 
required. 



Figure 15. Computing the local velocity field. (a) The initial motion measurements; 
(b) The velocity field corresponding to translation 

(ii) The analysis of motion has been separated into two distinct stages; first, the 
measurement of motion, and second, the use of motion for tasks such as object 
segmentation and structure from motion. This raises the question of whether the 
interpretation of three-dimensional structure can influence the computation of 
the two-dimensional velocity field. For example, does the assumption of rigidity, 
examined in Ullman’s work [13], enter into the velocity field computation? 
Psychological experiments [13] suggest that the long range motion correspondence 
is not influenced by the interpreted three-dimensional structure of a single view. 
The short range process may be similar. 

(iii) We support the idea that there exist two processes for analyzing motion, 
corresponding to Braddick’s long range and short range processes. We suggest that 
the long range process is based on a token-matching scheme, while the short range 
process is intensity-based. If this view is valid, it raises the following questions. How 
do the long range and short range processes interact? Do subsequent computational 
tasks, such as object segmentation or structure from motion, utilize the results of 
one or the other process? The work of Petersik [72] suggests that the long range 
process may be crucial to the recovery of structure from motion. Finally, it is 
interesting that neurophysiological studies have revealed many units which are 
responsive to, or selective for stimuli undergoing continuous motion. Little is known 
about the long range process at the neurophysiological level. One obvious question 
is, where in the visual system can apparent motion phenomena be observed in the 
response of single units? Motion sensitive units (for example in areas V1 and STS 
or MT of the monkey) could be tested for apparent motion response by flashing 
bars at stationary locations using relatively wide separations (that is, wider than 
the largest size of simple cells at the tested eccentricity). If long range motion 
units can be identified, it may become possible to go a step further and investigate 
the relationship between the psychophysically established correspondence rules and 
their neurophysiological correlates. 

(iv) We suggest that the initial stage of motion analysis consists of the measurement 
of the perpendicular component of velocity along zero-crossing contours. This 
can be examined through neurophysiological and psychological studies. In regard 
to neurophysiology, is the class of directionally-selective simple cells detecting 
the motion of zero-crossings in their input from the LGN? This is now under 
investigation [Richter, personal communication]; initial results tend to support this 
claim. Psychophysical experiments can test whether perceived motion is consistent 
with the motion of zero-crossings. 

(v) We propose that the local motion measurements are then integrated along 
zero-crossing contours. Again, this may be explored through both neurophysiology and 
psychology. If the motion integration problem is fundamental to motion analysis, 
one may expect to find neural mechanisms within the visual system that are involved 
in this task. Most of the motion sensitive units studied so far do not seem suitable 
for the integration stage. Motion selective cells in the primary visual cortex of the 
cat and the monkey respond primarily to edges and bars. To activate such a unit, the 
stimulus must have the preferred orientation, and move in the preferred direction. 
In contrast, promising candidates for the integration phase would dissociate the 
effects of orientation and direction of movement, ideally exhibiting specificity for 
direction of motion but not for orientation. Furthermore, the direction specificity 
of such a unit is expected to depend on the range of orientations spanned by 
the stimulus. There are indications for the possible existence of such units in 
the posterior bank of the superior temporal sulcus of the rhesus monkey [73]. For 
psychophysical experimentation, there are at least two questions; first, is the motion 
that we perceive forced to be consistent at least with the sign of the local motion 
measurements along zero-crossing contours, or can it be overridden, for example by 
the long range process, or by the history of the motion? Second, if the integration 
does take place, are measurements combined over neighborhoods in the image, or 
along contours? Wallach’s [74] demonstrations suggest that the integration may 
take place along contours. 

(vi) Additional assumptions are required for the motion integration problem. 
Regarding the human system, we may first ask what constraints are derived from 
the changing image. Does the human visual system strictly utilize instantaneous 
measurements of velocity, or is a second curve, at some small time interval later, 
also used to constrain the velocity field? Do we utilize an additional constraint on 
smoothness of the velocity field, as described here? A constraint such as smoothness 
may be the least restrictive constraint which allows objects to move freely in space 
and deform, but which still allows for the computation of a unique velocity field. 
Psychophysical experimentation is necessary to determine whether the velocity field 
that we perceive is the smoothest one possible. Both the short and long range 
processes face the fact that in general, the motion of elements is not specified 
uniquely by information in the changing image; do the additional assumptions 
governing the computation of velocity or correspondence differ in the two processes, 
or do they differ only in the constraints that are utilized from the changing image? 

(vii) Finally, the motion measurement problem has some implications for the 
interpretation of structure from motion. It has been shown [17] that 
three-dimensional shape can be recovered locally, from the instantaneous velocity field. 
The interpretation is sensitive, however, to small errors in the measured velocity. 
In light of the inherent difficulties in measuring the velocity field precisely, recovery 
methods that rely solely on the instantaneous velocity field appear unlikely. For 
the reliable recovery of three-dimensional structure from motion, processes that 
integrate motion over time are probably required. 